2. Overview


Doc bao 24h is an application that aggregates news from about 70
leading online newspapers. The system automatically fetches new
articles, and articles are displayed in order of the publish time
of the original article.





Doc bao 24h itself is treated as a single newspaper aggregated from
about 70 other online newspapers, with the following categories:


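Latest (Mới nhất): aggregates all articles from all sources
Current affairs (Thời sự)
Sports (Thể thao)
Law (Pháp luật)
Entertainment (Giải trí)
Personal stories (Tâm sự)
Technology (Công nghệ)
Economy (Kinh tế)
Education (Giáo dục)
Health (Sức khoẻ)
Discovery (Khám phá)
Vehicles (Xe cộ)
Game
Handbook (Cẩm nang)
Community (Cộng đồng)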


These categories may change according to the operational needs of
the product. Articles from the sources are sorted into the
categories above.


The sources are scanned automatically and continuously for new
articles. Links to new articles are collected from the main
categories of each source.
3. Main modules

Crawler module: automatically and periodically scans the categories
of each source newspaper. Links to new articles are taken from the
category pages of the newspaper.


Example with the Dan Tri newspaper: starting from the link of the
society category

https://dantri.com.vn/xa-hoi.htm

the crawler finds the article links, then crawls the detailed
information of each article in that list. Links that have been
crawled successfully are stored in SSDB so that the same article is
not crawled again in later scans.
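
The scan-and-deduplicate flow described above can be sketched roughly as follows. This is only an illustration: a plain Python set stands in for SSDB, the sample HTML is hypothetical (the real markup of the site is not reproduced), and the names `LinkCollector`, `new_article_links` and `mark_crawled` are invented for this sketch.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects absolute URLs from <a href="..."> tags on a category page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


def new_article_links(category_url, page_html, seen):
    """Return links on the page that have not been crawled yet, in page order."""
    collector = LinkCollector(category_url)
    collector.feed(page_html)
    fresh = []
    for link in collector.links:
        if link not in seen and link not in fresh:
            fresh.append(link)
    return fresh


def mark_crawled(link, seen):
    """Record a link only after its article was crawled successfully."""
    seen.add(link)


# `seen` is a plain set standing in for SSDB (assumption for this sketch).
seen = set()
category = "https://dantri.com.vn/xa-hoi.htm"
# Hypothetical page content, not the real markup of the site.
sample_html = '<a href="/xa-hoi/bai-1.htm">A</a><a href="/xa-hoi/bai-2.htm">B</a>'

first_scan = new_article_links(category, sample_html, seen)
for link in first_scan:
    mark_crawled(link, seen)
second_scan = new_article_links(category, sample_html, seen)  # nothing new
```

A real category page also contains navigation and advertising links, so a production crawler would need a per-site rule to keep only article links; that filtering is left out here.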

The following information is collected for each article:

1. Title of the article.

2. Thumbnail of the article.

3. Sapo (lead paragraph) describing the article.

4. Article content (the main HTML element containing the main
content). The HTML is not parsed at this crawl step.

5. Publish time of the article. This is very important because it
is used directly as the score of the article: the most recently
published articles are displayed at the top.
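
As a minimal sketch of the record listed above, the five fields can be modelled as a small data class whose score is derived from the publish time. The field names and the choice of a Unix timestamp as the score are assumptions made for illustration; the text only states that the publish time serves as the score.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class CrawledArticle:
    # Field names are illustrative; the text only lists the kinds of data.
    title: str
    thumb_url: str
    sapo: str
    raw_html: str  # main content element, still unparsed at this step
    published_at: datetime

    @property
    def score(self) -> int:
        """Publish time as a Unix timestamp, used directly as the score."""
        return int(self.published_at.timestamp())


def newest_first(articles):
    """Order articles so the most recently published one comes first."""
    return sorted(articles, key=lambda a: a.score, reverse=True)


older = CrawledArticle("A", "a.jpg", "lead A", "<div>A</div>",
                       datetime(2020, 1, 1, tzinfo=timezone.utc))
newer = CrawledArticle("B", "b.jpg", "lead B", "<div>B</div>",
                       datetime(2020, 1, 2, tzinfo=timezone.utc))
ordered = newest_first([older, newer])
```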


The image links of the article, including the thumbnail and the
images inside the article, are sent as events (via RabbitMQ) to a
separate image download service.
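
One possible shape for such an event is a small JSON message per article. The message layout below (`article_id`, `images`, `kind`) is hypothetical, since the text does not describe the actual format, and the sketch stops at building the message body rather than publishing it.

```python
import json


def build_image_event(article_id, thumb_url, body_image_urls):
    """Serialize the image links of one article into a JSON message body."""
    images = [{"url": thumb_url, "kind": "thumb"}]
    images += [{"url": url, "kind": "body"} for url in body_image_urls]
    return json.dumps({"article_id": article_id, "images": images})


event = build_image_event(
    42,
    "https://example.com/thumb.jpg",
    ["https://example.com/1.jpg", "https://example.com/2.jpg"],
)
decoded = json.loads(event)
```

Marking each link as `thumb` or `body` would let the download service decide which files to resize without looking the article up again.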


The detailed crawled content of each article is stored in MySQL.


Data import module: automatically and periodically reads newly
crawled articles from MySQL, parses the HTML tags in the article
content, and then inserts the result into Redis. The expiry time in
Redis is 2 weeks (this value can be adjusted depending on how long
articles should remain in the system).

Each article is stored under one key in Redis. In addition, the
article is added to the sorted lists that correspond to the
categories of its source newspaper.
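
The storage scheme above could map onto Redis commands roughly as follows. The key names are hypothetical, and the use of `ZADD` assumes that the "sorted lists" are Redis sorted sets scored by publish time, which the text does not state explicitly. The function only builds the commands as tuples, so it can be read without a running Redis server.

```python
import json

TWO_WEEKS_SECONDS = 14 * 24 * 60 * 60  # expiry stated in the text


def redis_commands_for(article_id, website_id, topic_id, published_ts, payload):
    """Return the Redis commands (as tuples) that would store one article."""
    article_key = f"article:{article_id}"           # hypothetical key layout
    list_key = f"articles:{website_id}:{topic_id}"  # hypothetical key layout
    return [
        # One key per article, expiring after two weeks.
        ("SET", article_key, json.dumps(payload), "EX", TWO_WEEKS_SECONDS),
        # Index the article in its category list, scored by publish time.
        ("ZADD", list_key, published_ts, article_id),
    ]


commands = redis_commands_for(
    article_id=1001,
    website_id=1,
    topic_id=7,
    published_ts=1577836800,
    payload={"title": "Example"},
)
```

Note that the sorted-set entry does not expire with the article key, so a real importer would also need to trim old members from each list.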


DownloadImage module: receives events describing the images to
download from the crawler module via RabbitMQ, then downloads the
images and saves them to the hard disk. Thumbnail files are
resized.
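
The text does not say how thumbnails are resized. As one plausible approach, the target size can be computed so that the longer side fits a maximum while the aspect ratio is kept. The function name `thumb_size` and the limit of 300 pixels are arbitrary choices for this sketch.

```python
def thumb_size(width, height, max_side=300):
    """Scale (width, height) so the longer side is at most max_side.

    The aspect ratio is preserved and images that are already small
    enough are left unchanged. max_side=300 is an arbitrary example value.
    """
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


landscape = thumb_size(1200, 800)
portrait = thumb_size(800, 1200)
small = thumb_size(200, 100)
```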


VideoService module: a service that re-fetches the video links in
articles for sites that set an expiry time on their videos. Video
is the most troublesome part, because the sources keep changing
their video playback technology. The system currently does not
handle videos in articles that are sourced from YouTube.


API service module: provides the methods that clients use to fetch
data, reading directly from Redis. Every API call to the server
carries a “db24h” parameter, which is generated using the Google
Authenticator algorithm. The purpose is to prevent third parties
from reusing the news content of Doc bao 24h. The important APIs:


1. /docbao24h/api/v1.0/website : returns the configuration of the
list of newspapers currently in the system.


2. /docbao24h/api/v1.0/articles : parameters are websiteID +
topicID; returns the list of articles in the same category of the
newspaper.


3. /docbao24h/api/v1.0/articles/info : parameter is articleID;
returns the detailed content of the article.


4. /docbao24h/api/v1.0/articles/relative : parameter is articleID;
returns the list of articles related to an article.


5. /docbao24h/api/v1.0/articles/tintaitro : returns the
configuration of the third-party advertisements currently deployed
in the app.
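
Google Authenticator codes are time-based one-time passwords (TOTP, RFC 6238), so the “db24h” parameter can be sketched as below. This is a sketch under stated assumptions, not the actual client code: the shared secret, the 30-second step, the 6-digit length, the host name and the idea of passing everything as query parameters are all assumptions, and only the endpoint path and the names websiteID, topicID and db24h come from the text.

```python
import hashlib
import hmac
import struct
from urllib.parse import urlencode


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password as defined in RFC 6238 (HMAC-SHA1)."""
    counter = struct.pack(">Q", unix_time // step)
    digest = hmac.new(secret, counter, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation offset
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def articles_url(base, website_id, topic_id, secret, unix_time):
    """Build a request URL for the articles endpoint with a db24h token."""
    query = urlencode({
        "websiteID": website_id,
        "topicID": topic_id,
        "db24h": totp(secret, unix_time),
    })
    return f"{base}/docbao24h/api/v1.0/articles?{query}"


# Secret and time taken from the RFC 6238 test vectors; the host is hypothetical.
rfc_secret = b"12345678901234567890"
url = articles_url("https://api.example.com", 1, 7, rfc_secret, 59)
```

The server would compute the same code from its copy of the secret and reject requests whose token does not match the current time window.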


